PCA Test

Principal component analysis (PCA) is a linear dimensionality reduction technique that uses the Singular Value Decomposition (SVD) of the centered data to project it onto a lower-dimensional space. Here, we project an 11-dimensional data matrix onto 2 dimensions.
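
To make the SVD step concrete, here is a minimal NumPy sketch of the same idea. It is not scikit-learn's actual implementation. The function name `pca_svd` and the random `toy` array are illustrative assumptions, with the shape chosen only to mimic 176 samples with 11 features.

```python
import numpy as np

def pca_svd(data, n_components=2):
    """Project the rows of `data` onto its top principal components via SVD."""
    data = np.asarray(data, dtype=float)
    # PCA works on centered data: subtract the mean of each feature (column)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Coordinates of every sample in the reduced space
    projected = centered @ components.T
    # Variance captured along each kept direction
    explained_variance = (s ** 2) / (data.shape[0] - 1)
    return projected, components, explained_variance[:n_components]

# Synthetic stand-in data (assumption): 176 samples, 11 features
rng = np.random.default_rng(0)
toy = rng.normal(size=(176, 11))

projected, components, variance = pca_svd(toy)
print(projected.shape)  # (176, 2)
```

The sign of each component is arbitrary in an SVD, so a plot made from this sketch may appear mirrored relative to one produced by another implementation.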

Import Dependencies



In [1]:

    
import numpy as np
import matplotlib.pyplot as plt

from sklearn import decomposition

import csv
%run 'preprocessor.ipynb'  # our own preprocessor functions

Prepare Dataset



In [2]:

    
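# Read the raw CSV rows as lists of strings
with open('data_w1w4.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)

# Build the data matrix with our helper from preprocessor.ipynb
matrix = obtain_data_matrix(data)
samples = len(matrix)

print("Number of samples: " + str(samples))
print("First entry: " + str(matrix[0]))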

Number of samples: 176
First entry: [[2680 1 0 0 0 0 0 0 4.9481 72 5 0]]

Prepare the plot



In [3]:

    
fig = plt.figure(1, figsize=(10, 6))
plt.clf()

Do PCA



In [4]:

    
plt.cla()
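# Fit PCA and project the data onto the first two principal components
pca = decomposition.PCA(n_components=2)
# np.asarray guards against obtain_data_matrix returning an np.matrix,
# which recent scikit-learn versions reject
X = pca.fit_transform(np.asarray(matrix))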
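
scikit-learn's PCA centers each feature but does not rescale it, so a feature with a much larger numeric range can dominate the leading components. The first entry printed above mixes values such as 2680 with 0/1 flags, so this may be worth checking. The sketch below uses synthetic data (an assumption, not the notebook's dataset) to show how `explained_variance_ratio_` changes when the features are standardized first with `StandardScaler`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 176 samples, 11 features,
# with the first column on a much larger scale than the rest
rng = np.random.default_rng(0)
toy = rng.normal(size=(176, 11))
toy[:, 0] *= 1000.0

# Without scaling, the large-scale column accounts for almost all the variance
raw_ratio = PCA(n_components=2).fit(toy).explained_variance_ratio_

# Standardize every feature to zero mean and unit variance, then refit
scaled = StandardScaler().fit_transform(toy)
scaled_ratio = PCA(n_components=2).fit(scaled).explained_variance_ratio_

print("Unscaled:    ", raw_ratio)
print("Standardized:", scaled_ratio)
```

Whether standardizing is appropriate depends on what the features mean. If the large-valued column is an identifier rather than a measurement, dropping it may be the better choice.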

Plot the Data



In [5]:

    
plt.scatter(X[:, 0], X[:, 1], edgecolor='k')
plt.xlabel('Principal component 1')
plt.ylabel('Principal component 2')

plt.show()